Stanfield Systems Incorporated - VIM Toolkit

VAST 2009 Challenge
Challenge 1: Badge and Network Traffic

Authors and Affiliations:

Tim Jacobs, Stanfield Systems Incorporated, tjacobs@stanfieldsystems.com [PRIMARY Contact]
Delos Ford, Stanfield Systems Incorporated

Tool(s):

The Visual Information Management (VIM) Toolkit was developed in-house at Stanfield Systems to apply a range of visualization methods generically to a variety of data sources. It can perform many common data processing tasks and, by default, can read virtually any common data format, which includes the ability to easily combine data from disparate sources. The processed data can then be visualized in a variety of ways, with the added benefit that multiple visualizations can be shown on one screen. Additionally, the visualizations displayed by the toolkit are all interactive, using various elements of focus-plus-context design. More information about the tool and Stanfield Systems is available at this link.

Video:

Click here for the full-size video

ANSWERS:


MC1.1: Identify which computer(s) the employee most likely used to send information to his contact in a tab-delimited table which contains for each computer identified: when the information was sent, how much information was sent and where that information was sent.

Traffic.txt

MC1.2:  Characterize the patterns of behavior of suspicious computer use.

The patterns identified for suspicious computer use in this case fell under two criteria:

1) Uploading data means a large request size and a small response size.
2) Presumably, the location of the server doesn't change.

Given these two criteria, a few sections of the data were identified as suspicious. Further analysis of the individual transactions ruled out all but the group of uploads to 100.59.151.133. All other destination IPs were ruled out either because they were internal IPs or because they did not make sense as the spy's point of contact given the prox card data. The prox card data cannot give a fully accurate picture of what happened, but it can rule things out. Checking the prox card data against every apparent upload to that IP shows that, from the information given, that address cannot be ruled out as the destination of the data.
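The two criteria can be sketched as a simple filter over the traffic records. This is only an illustration, not the VIM Toolkit's implementation: the field names (`dest_ip`, `request_bytes`, `response_bytes`) and the numeric thresholds are assumptions, since the analysis above does not state exact cutoffs.

```python
from collections import defaultdict

# Hypothetical cutoffs; the original analysis gives no numeric thresholds.
MIN_REQUEST_BYTES = 1_000_000
MAX_RESPONSE_BYTES = 100_000


def find_upload_candidates(records,
                           min_request=MIN_REQUEST_BYTES,
                           max_response=MAX_RESPONSE_BYTES,
                           min_count=2):
    """Group upload-like transactions by destination IP.

    records: iterable of dicts with the (assumed) keys
    'dest_ip', 'request_bytes' and 'response_bytes'.
    """
    by_dest = defaultdict(list)
    for rec in records:
        # Criterion 1: large request size, small response size.
        if (rec["request_bytes"] >= min_request
                and rec["response_bytes"] <= max_response):
            by_dest[rec["dest_ip"]].append(rec)
    # Criterion 2: the server location is presumed not to change,
    # so keep only destinations that receive repeated uploads.
    return {dest: recs for dest, recs in by_dest.items()
            if len(recs) >= min_count}
```

Each destination returned by such a filter would still need the manual review described above, for example to discard internal IPs.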


In addition to failing to rule out that set of transactions, the prox card data for those transactions turned up some oddities. The transactions were frequently made late in the day, presumably after most people had left, and a few occurred before the owner of the sending computer had even entered the office for the day. Many of the other transactions occurred between the computer owner entering and exiting the classified room of the embassy, which means the owner was not at their desk. Moreover, not only did the prox card data strongly suggest that the computer owner was absent at the time of transmission, but in all cases the owner's office mate also appeared to be gone.
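The cross-check between a transaction time and the computer owner's badge events can be sketched as follows. The event type names and the `(timestamp, event_type)` layout are hypothetical placeholders rather than the actual field values in the challenge data. The sketch also makes no attempt to detect that someone has left for the day, which the analysis above only presumes.

```python
from datetime import datetime

# Hypothetical event labels; the real prox card data may name these differently.
ENTER_BUILDING = "enter_building"
ENTER_CLASSIFIED = "enter_classified"
EXIT_CLASSIFIED = "exit_classified"


def owner_status(prox_events, when):
    """Classify where a computer's owner probably was at time `when`.

    prox_events: list of (datetime, event_type) tuples for one employee
    on one day, in any order.
    """
    # Keep only badge events at or before the transaction, oldest first.
    earlier = sorted((t, kind) for t, kind in prox_events if t <= when)
    if not earlier:
        # No badge activity yet: the owner had not entered the office.
        return "not_yet_arrived"
    last_kind = earlier[-1][1]
    if last_kind == ENTER_CLASSIFIED:
        # Entered the classified room and has not badged out of it yet.
        return "in_classified_area"
    return "possibly_at_desk"
```

A transaction whose owner, and owner's office mate, both come back as anything other than `possibly_at_desk` is the kind of contradiction described above.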

The contradictions found by examining the prox card data in conjunction with the IP data clearly show that the server at 100.59.151.133 is the point of contact for the spy within the embassy. Considering that there are 18 transactions, the odds of the observed phenomena occurring by chance are astronomically low.
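A rough way to see why 18 coincidences are unlikely is to assume, purely for illustration, that each transaction independently has some fixed chance of falling in a period when the owner happens to be away. Neither the independence assumption nor the example probability comes from the challenge data.

```python
def chance_all_absent(p_absent, n_transactions):
    """Probability that every one of n independent transactions
    falls in a period when the computer's owner is away."""
    if not 0.0 <= p_absent <= 1.0:
        raise ValueError("p_absent must be between 0 and 1")
    return p_absent ** n_transactions


# Hypothetical: the owner is away from the desk half of the time.
print(chance_all_absent(0.5, 18))  # 1 / 262144, roughly 3.8e-06
```

Even with this generous assumed absence rate, the chance of all 18 transactions lining up is on the order of a few in a million.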

The visualization used to analyze this data was not complex. The data has no real relationships, and the criteria for what to look for are far too vague and intuitive to set up complex analytics. Recognizing that many problems have no easy analytics answer, we developed the visual table-type browsers to speed up analysis of data of this sort. By themselves, the table browsers are little better than a spreadsheet; however, when multiple tables are placed together and relationships are set up between them, the advantages are clear.

Defining the relationship as we did allowed us to speed up the analysis compared with a conventional database table view. Clicking on one of the suspicious events in the top view highlights all events related to that ID in the bottom view (prox data), as you can see in the image to the right. Doing this in a conventional way would require an analyst to run a large number of repeated (and somewhat redundant) database queries tailored to look for specific things, and to switch back and forth between windows frequently. Being able to see the two views simultaneously and have them automatically highlight the entries of interest offers a clear advantage over the conventional approach to a problem of this nature.
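The linked-selection behavior can be approximated with a small index keyed on the shared ID. This is a minimal sketch of the idea only; the class name, the row layout and the `employee_id` key used in the usage note are assumptions, not the toolkit's actual API.

```python
from collections import defaultdict


class LinkedTables:
    """Two row lists joined on a shared key, so that selecting a row in
    the top table yields the related rows in the bottom table."""

    def __init__(self, top_rows, bottom_rows, key):
        self.top_rows = top_rows
        self.bottom_rows = bottom_rows
        self.key = key
        # Build the lookup once so each click avoids a fresh query.
        self._index = defaultdict(list)
        for i, row in enumerate(bottom_rows):
            self._index[row[key]].append(i)

    def select(self, top_index):
        """Return indices of bottom rows to highlight for a selected top row."""
        value = self.top_rows[top_index][self.key]
        return list(self._index.get(value, []))
```

For example, with traffic rows on top and prox rows on the bottom, `LinkedTables(traffic_rows, prox_rows, key="employee_id").select(0)` would return the positions of every prox event belonging to the employee tied to the first traffic row.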